Advancing molecular modeling and reverse vaccinology in broad-spectrum yellow fever virus vaccine development

Yellow fever outbreaks are prevalent, particularly in endemic regions. Given the lack of an established treatment for this disease, significant attention has been directed toward managing this arbovirus. In response, we developed a multiepitope vaccine designed to elicit an immune response, utilizing advanced immunoinformatic and molecular modeling techniques. To achieve this, we predicted B- and T-cell epitopes using the sequences from all structural (E, prM, and C) and nonstructural proteins of 196 YFV strains. Through comprehensive analysis, we identified 10 cytotoxic T-lymphocyte (CTL) and 5T-helper (Th) epitopes that exhibited overlap with B-lymphocyte epitopes. These epitopes were further evaluated for their affinity to a wide range of human leukocyte antigen system alleles and were rigorously tested for antigenicity, immunogenicity, allergenicity, toxicity, and conservation. These epitopes were linked to an adjuvant (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta$$\end{document}β-defensin) and to each other using ligands, resulting in a vaccine sequence with appropriate physicochemical properties. The 3D structure of this sequence was created, improved, and quality checked; then it was anchored to the Toll-like receptor. Molecular Dynamics and Quantum Mechanics/Molecular Mechanics simulations were employed to enhance the accuracy of docking calculations, with the QM portion of the simulations carried out utilizing the density functional theory formalism. Moreover, the inoculation model was able to provide an optimal codon sequence that was inserted into the pET-28a( +) vector for in silico cloning and could even stimulate highly relevant humoral and cellular immunological responses. Overall, these results suggest that the designed multi-epitope vaccine can serve as prophylaxis against the yellow fever virus.


Prediction of T-cell epitopes
Firstly, the primary sequences of the structural (C, M, and E) and nonstructural (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5) proteins of the yellow fever virus (YFV) were obtained from the Virus Pathogen Resource (ViPR) database (https:// www.viprb rc.org/ brc/ home.spg? decor ator= vipr) 28 .The data were meticulously filtered according to specific criteria, including Family Flaviviridae, Genus Flavivirus, Species Yellow fever virus, and Global geographic group, leading to a preliminary selection of 385 sequences from the ViPR database; this was further refined to 196 sequences for detailed investigation based on additional criteria of 'genome complete' and 'human host group' .This comprehensive analysis encompassed sequences from various geographical regions, including Africa (27), Asia (9), Europe (28), North America (7), and South America (125), thus ensuring a globally representative and diverse dataset for our yellow fever virus research.To achieve our research objectives, the proteomes of these 196 viral strains were meticulously compiled, categorized by protein type, and systematically aligned to derive consensus sequences.
We conducted MHC class I restricted (CTL) epitope prediction using two online servers: ProPred I (http:// crdd.osdd.net/ ragha va/ propr ed1) and NetCTL (http:// www.cbs.dtu.dk/ servi ces/ NetCTL/).ProPred I identifies promiscuous regions in protein sequences by employing matrices for 47 MHC-I alleles and models for proteasomal and immunoproteasome processing 33 .The analysis was carried out with specific parameters, including a 4% threshold, and the activation of proteasome and immunoproteasome filters, each set at a threshold of 5%, as referenced in 34,35 .
For validation of the identified epitopes in ProPred, we utilized NetCTL.NetCTL not only predicts potential epitopes within protein sequences for cytotoxic T lymphocytes (CTL) but also provides information regarding proteasomal C-terminal cleavage through artificial neural networks and TAP transport efficiency through weight matrix calculations 36 .The thresholds for CTL epitope identification, C-terminal cleavage, and TAP transport efficiency were set at 0.75, 0.15, and 0.05, respectively.
Similarly, two distinct methods were employed to assess the HTL epitopes capable of binding to HLA-DQ, HLA-DP, and HLA-DR alleles, employing artificial neural networks.The IEDB tool (http:// tools.iedb.org/ mhcii/) utilizes a big dataset including over 10,000 unpublished MHC-peptide binding affinities, 29 peptide/MHC crystal structures, and 664 peptides that have been experimentally tested 29 .To validate the forecasts generated by IEDB, the protein sequences have been uploaded to the NetMHCIIpan database., which is accessible at http:// www.cbs.dtu.dk/ servi ces/ NetMH CIIpan/.This server utilizes a vast dataset that includes more than 100,000 quantitative measurements of peptide binding sourced from IEDB.It covers a wide range of molecules, including HLA-DR, HLA-DQ, HLA-DP and even mouse MHC-II molecules, namely 36 HLA-DR, 27 HLA-DQ, 9 HLA-DP and 8 mouse MHC-II molecules 38 .www.nature.com/scientificreports/

Prediction of B-cell epitopes
The "Bepipred Linear Epitope Prediction 2.0" method, available through the IEDB tool at http:// tools.iedb.org/ bcell/, was employed to enhance the accuracy of predicting B lymphocyte (BL) epitopes within protein sequences."Prediction 2.0" was specifically developed to address limitations observed in other epitope prediction tools, which predict continuous epitopes through a random forest algorithm trained on epitopes annotated from antibody antigen protein structures 30 .The standard threshold of 0.5 for predictions was maintained.

Antigenicity prediction
The theoretical epitopes underwent an antigenicity evaluation step in which they were individually entered into VaxiJen 2.0 (http:// www.ddg-pharm fac.net/ vaxij en/ VaxiJ en/ VaxiJ en.html).This software assesses antigenicity, taking into account the selection of a target organism, such as a virus, bacterium, tumor, parasite, or fungus.In this analysis, a threshold of 0.5 was adopted, as this value aligns with the peak accuracy for most of the models used 31 .

Allergenicity prediction
The allergic or nonallergic nature of possible epitopes was predicted using the AllerTop 2.0 server (http:// www.ddg-pharm fac.net/ Aller TOP/).This online server evaluates the similarities between the peptide sequence under study and the sequences in its database.The epitopes are individually assessed, and the outcome, indicating whether the sequence is likely allergenic or non-allergenic, is furnished.Additionally, a link to the protein with a similar sequence is provided.

Immunogenicity prediction
Immunogenicity scores of CTL epitopes were calculated using IEDB immunogenicity (http:// tools.iedb.org/ immunogenicity/).This tool collects the most important variables affecting immunogenicity, such as the P4-6 position of a peptide and amino acids with large and aromatic side chains 32 .The masking option used was the default (1, 2, and c-terminal) and the cutoff was set to zero 33 .

Toxicity prediction
To guarantee that the chosen epitopes were not toxic, the ToxinPred web server (http:// www.imtech.res.in/ ragha va/ toxin pred/) was used.To predict the toxicity of the epitopes, the server is based on the physicochemical characteristics of the peptides using machine learning and a quantitative matrix.The database of this method includes 1,805 toxic and 3,593 non-toxic peptides 34 .

Conservation analysis
To gauge the extent of epitope conservation within the acquired protein sequences at varying levels of identity, the IEDB conservation tool, accessible at http:// tools.iedb.org/ conse rvancy 44 , was utilized.The extent of conservation is measured by the percentage of protein sequences with which the epitope is identical at a given level of similarity.This approach allows for the selection of broadly protective epitopes.

Population coverage analysis
Population coverage provides a direct indication of vaccine efficacy in different geographic regions by examining the prevalence of human leukocyte antigen (HLA) alleles associated with the epitope of interest.For this study, the selected epitopes, along with their respective HLA-binding alleles, were submitted to the IEDB population coverage tool (http:// tools.iedb.org/ conse rvancy) 35 .This tool was programmed for the principal endemic regions of yellow fever-South America and Africa.

Multi-epitope vaccine construction
The vaccine sequence was constructed using LCTL and HTL epitopes with sequences overlapping with the BL epitopes that passed all immunoinformatic analyses.These sequences were connected using AAY and GPGPG linkers, respectively 36 .The AAY peptide linker helps the epitopes generate suitable sites for binding to the TAP transporter and enhances epitope presentation, whereas the GPGPG linker stimulates TCD4 + responses and preserves the conformational immunogenicity of the helpers as well as the antibody epitopes 37 .
A β-defensin adjuvant sequence was added to the N-terminus of the multi-epitope vaccine using the linker EAAAK, thus enhancing its immunogenicity.The β-defensin induces recruitment of naive T cells and immature dendritic cells by contacting TLR and CCR 6 (chemokine receptor 6) receptors at the site of infection 38 , and the linker EAAAK reduces association with other protein domains with efficient detachment and increases stability 39 .

Assessment physicochemical properties of the vaccine prototype
To analyze the physicochemical characteristics of the vaccine model, such as: To obtain information on molecular weight, pI (isoelectric point), half-life, instability index, aliphatic index, and GRAVY (Grand Average Hydropathy) of the vaccine sequence, it was subjected to analysis using the ExPASy-ProtParam tool, available at http:// web.expasy.org/ protp aram/.These analyses leveraged a database of proteins with well-established properties, which were used as reference parameters for the provided protein sequence 40 .

Design, refinement and validation of the tertiary structure of the vaccine prototype
The Raptor-X server, accessible at http:// rapto rx.uchic ago.edu/, was used to predict the three-dimensional structure of the vaccine sequence.This procedure consists of taking an input sequence in FASTA format and www.nature.com/scientificreports/implementing three techniques: single and multiple template threading, along with prediction of alignment quality 7 .To gauge the precision of the projected 3D structure, the online website is the most trusted sources, which encompass the P-value for relative global quality, as well as GDT (Global Distance Test) and uGDT (un-normalized GDT) for assessing the overall structural integrity 52 .The refinement of the tertiary structure was carried out through the GalaxyRefine 2 server, which is available at http:// galaxy.seokl ab.org/ cgi-bin/ submit.cgi? type= REFIN E2.This web server employs a specialized approach for refining 3D structures by implementing short Molecular Dynamics simulations after repeated perturbations involving side-chain repacking at both global and local levels.This method enables more extensive structural adjustments 53 .It incorporates multiple local and global move sets and iteratively accumulates conformational changes, facilitating larger-scale modifications.The local and global move sets use an estimated structure error to concentrate refinement efforts in regions with greater inaccuracies.
Finally, the quality and potential errors in the 3D model were checked using MolProbity, Swiss Model, ProSAweb, ERRAT, and Verify3D.For structural validation, the Swiss Model's Structure Assessment Tool, available at https:// swiss model.expasy.org/ assess, was employed to obtain information on both the global and local aspects of the structure.This tool utilizes its own methods, including QMEAN and Ramachandran plot analysis, and can also run additional software tools like MolProbity.In addition, the ProSA web server, accessible at https:// prosa.servi ces.come.sbg.ac.at/ prosa.php, was included in the analysis to validate the structural quality, and a quality score (Z-score) for the input structure was calculated.Score values that fall outside a typical range for native proteins suggest that there are likely errors in the structure 41 .Another validation server, known as ERRAT (http:// servi ces.mbi.ucla.edu/ ERRAT/) 42 , assessed disjoint interactions within the framework.The accuracy of the 3D model's design was assessed using Verify3D, available at https:// servi cesn.mbi.ucla.edu/ Verif y3D/.This tool gauges the compatibility of the model with its corresponding amino acid sequence, providing insights into the quality and reliability of the structural model 43 .

Molecular docking simulations and refinement
Binding of the vaccine to the appropriate immunological recipient is paramount to elicit an appropriate immune response 44,45 .Toll-like receptors (TLR) are members of a family of pattern recognition receptors that recognize products of various microorganisms 46 .In the recognition of YFV by the host immune system, TLR-2 along with three other Toll-like receptors (7, 8, and 9) are described to be crucial for the interactions between 17D vaccine and human cells stimulating a mixed Th2 and Th1 cell profile 47 .
Therefore, the vaccinal model was linked to the TLR-2 receptor (PDB ID: 2z7x) using PatchDock server (http:// bioin fo3d.cs.tau.ac.il/ Patch Dock/).PatchDock 48 is a geometry-based molecular docking algorithm.This program, upon inputting two molecules, segments them into patches based on their surface shape, effectively dividing them into patterns akin to visually distinguishable puzzle pieces.The algorithm entails several key steps: (a) the representation of molecular shape, (b) matching surface patches, (c) filtering and evaluation.These patches can be overlaid using shape matching algorithms to facilitate the comparison and analysis of the two molecular structures 49 .
The FireDock web tool (http:// bioin fo3d.cs.tau.ac.il/ FireD ock/) 50 was used to optimize and re-evaluate the rigid-body molecular docking solutions.The final 10 models are categorized based on a general energy that includes atomic contact energy as well as van der Waals interaction, partial electrostatics, and binding energy estimates.The most promising Firedock model underwent further refinement within the HADDOCK interface, which can be accessed at https:// bianca.scien ce.uu.nl/ haddo ck2.4/ refin ement/1.This server offers a list of clusters ranked by score and provides comprehensive statistics, including the average score for the top four structures within each cluster.This step allows for a more in-depth analysis and selection of refined structures based on their quality and score 51 .
Molecular dynamics simulation (MD) is an important technique for analyzing the strength of the receptorligand complex.It was used with the WEBGRO for Macromolecular Simulations (https:// simlab.uams.edu/) to investigate the binding stability of the final complex 52 .In the simulation of the TLR2-vaccine complex, a 50 ns timeframe was employed.The GROMOS96 43a1 force field parameters were used for the simulation.The entire system was solvated in water, neutralized to balance the charges, and supplemented with 0.15 M NaCl salt to mimic physiological conditions.Key simulation parameters monitored during this process included the Root Mean Square Deviation (RMSD) and the Root Mean Square Fluctuation (RMSF).These parameters are fundamental for assessing the stability and dynamics of the system throughout the simulation.
To identify the most pertinent vaccine-TLR2 complex resulting from the docking calculations, we employed the combined quantum mechanics/molecular mechanics technique (QM/MM).This approach combines quantum mechanics for the ligand-receptor interactions with molecular mechanics for the surrounding environment, allowing for a more accurate representation of the system's behavior.QM/MM methods have solidified their position as advanced computational tools for studying biomolecular systems, as evidenced by the increasing number of publications utilizing these methods 20,26,53,54 .this procedure consists of taking an input sequence in FASTA format and implementing three techniques: single and multiple template threading, along with prediction of alignment quality 7 .To gauge the precision of the projected 3D structure, the server offers confidence scores, which encompass the P-value for relative global quality, as well as GDT (Global Distance Test) and uGDT (un-normalized GDT) for assessing the overall structural integrity 52 .The ONIOM multilayer technique, a unified strategy accessible in the Gaussian code, was used to carry out the QM/MM optimization.Ab initio calculations of the total energy of large complexes, such as biological systems, are possible and accurate using this approach when the systems have been divided into two or three layers.The TLR-2 receptor was placed in the MM layer, while the vaccine's main residues of amino acids were assigned to the QM layer.
To enlarge the electronic orbitals in the QM layer, we used the well-known B3LYP hybrid functional (Becke, three parameters, Lee-Yang-Parr) for exchange-correlation in conjunction with the 6-311G (d, p) basis set.Notably, during the geometry optimization process, all amino acid residues within a 6.0 Å radius from the ligand's centroid were allowed to adjust their positions [55][56][57] .This approach facilitates the accurate exploration of the ligand-receptor interactions and structural changes within the specified region of the complex 58 .
The best PatchDock + FireDock + HADDOCK (PFH) and PatchDock + FireDock + HADDOCK + MD + QM/ MM (PFHMQM) models were operated in the PRODIGY prediction for a comparative analysis of binding energies.The PRODIGY forecast protein-protein binding affinity (or binding free energy) on the basis of the biological system's structure and function, i.e., the interfacial contact network 59 .RMSD analysis in Discovery studio compared structures to original PatchDock complexes, revealing structural disparities 60 .

Codon adaptation and in silico cloning
Codon adaptation according to the host microorganism to be used is a very important step for in silico cloning.For this purpose, we used the Java Codon Adaptation Tool (http:// www.jcat.de/), which specializes in predicting an optimized coding sequence for each input sequence (DNA or protein).Its result output includes the optimized gene sequence along with its codon adaptive index (CAI) and the percentage of CG content 61 .In this step, E. coli k12 was considered, which is widely used as a host microorganism.In addition, three criteria were selected (a) avoidance of rho-independent transcription terminators, (b) avoidance of prokaryotic ribosome binding sites, (c) avoidance of restriction enzyme cleavage sites.This aligned sequence was inverted using the IUPAC convention (https:// arep.med.harva rd.edu/ labgc/ adnan/ proje cts/ Utili ties/ revco mp.html) to show complementarity with the replication cycle of its vector.The restriction sites XhoI and BamHI were added to the N-terminus and C-terminus of the optimized reverse cDNA sequence.This resulting sequence was inserted into the pET-28a( +) vector using SnapGene v4.2 software (http:// www.snapg ene.com) 62 for subsequent in silico cloning.

Immune response simulation
The immunogenicity and immune response of the vaccine construct were assessed using the C-ImmSim service (https:// kraken.iac.rm.cnr.it/C-IMMSIM/), which combines molecular biology approaches with data-driven prediction methods to provide a comprehensive profile 63 .The program was adjusted so that the period between injection doses is approximately one month, which equates to (84 time steps), and the simulation steps were set to one thousand, while all other stimulation parameters were left at their default values.

Results and discussion
Vaccines are the best strategy to prevent infectious diseases by generating protective immunity.Conventional vaccines are used worldwide and are considered the best method for treating various diseases.However, new vaccination tactics are required immediately to address the problems associated with live or attenuated vaccines (see our introductory section).For example, multiepitope-based vaccines produced by reverse vaccinology techniques are harmless, more stable, and easier to produce than attenuated viral vaccines.In addition, they are recommended primarily for their cost-effectiveness and higher efficacy 64,65 .
The importance of computational methods in the development of these vaccines is growing.Current approaches in molecular modeling, bioinformatics, and immunoinformatic have accelerated the production process and enabled screening of genomes to identify potential vaccine candidates and develop multiepitope vaccines with higher efficacy.This technology has evolved to identify viral proteome areas that are potentially capable of activating innate and adaptive immune responses to induce protective memory.Therefore, it has been used in several studies with pathogenic microorganisms such as SARS-CoV-2 20 and other arboviruses such as dengue virus 44 , Burkholderia 64 , Chikungunya virus 24 , and Mayaro virus 25,26 .Recent studies in animal models have shown excellent results for multiepitope vaccines, suggesting that this platform is a promising and safe method compared with attenuated vaccines 66 .Interestingly, the first candidate vaccine against malaria to progress to phase III clinical trials is the MosquirixTM, which comprises contiguous epitopes derived from the circumsporozoite protein of Plasmodium falciparum 67 .
In the present research, we proposed a multi-epitope vaccine against Yellow fever virus based on a robust methodology.The results will be presented and discussed below.

Acquisition of viral protein sequences
In our quest to explore antigenic epitopes for the development of an effective Yellow Fever Virus (YFV) vaccine, we conducted a thorough examination of validated data from the ViPR database.This investigation yielded a total of 5, 5, 3, 9, 10, 10, 6, 10, 7, 5, and 4 protein sequences for the proteins E, C, M, NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5, respectively, within the YFV.All these proteins were taken from a total of 196 complete genomes, included sequences from different geographical regions such as Africa (27), Asia (9), Europe (28), North America (7) and South America (125).The consensus sequences of these proteins were used to predict the B and T cell epitopes for the design of the multi-epitope vaccine.Some studies concluded that the envelope protein is critical for eliciting a strong humoral and cellular adaptative immune response against YFV 68,69 .Analysis of most antigenic regions of this protein revealed a total of 13 epitopes with possible affinity for MHC-I, but epitopes E 40−48 , E 237−245 , E 265−273 , and E 334−342 did not have binding affinity to at least 5 HLA alleles, so they were discarded from our analysis.Of the remaining 9 epitopes, 6 had no antigenicity values above 0.5, and E 234−242 was classified as a probable allergen.E 239−247 and E 284−292 had negative immunogenicity values and were also discarded (Table 2).
As we saw here, none of these peptides matched our predicted peptides.Using a more robust and sophisticated methodology, we excluded E 72−8770,71 E 329−343 , and E 337−35170 epitopes because they had very low antigenicity values 31 .In addition, we discarded E 284−29272 because it had an immunogenicity score (i.e., a T-cell recognition score) of -0.19211, indicating that the peptide-MHC-I complex formed by this epitope is theoretically not immunogenic in humans 32 .The selection of such epitopes (and many others) for the construction of a vaccine would not confer immunogenic power to it, which could logically be confirmed only by experimental testing.
It is essential to stress that we configure and parameterize the tools we use for highly sensitive analysis.Although we found a lower number of epitopes than other authors, we guarantee high confidence in the regions of the YFV proteome selected for the composition of our vaccine prototype by avoiding false positives.
The promiscuous epitope M 57−65 , which belonged to 11 HLA class I alleles, had an antigenicity of 0.5028 but was noted as a possible allergen in the allergenicity analysis.M 39−47 , M 41−49 and M 60−68 had satisfactory results in all analyses.However, only M 60−68 , which was predicted for 14 alleles, was conserved in all M protein sequences (Table 3).www.nature.com/scientificreports/As happened with protein E, the epitopes of protein C did not pass through the analyses to which they were subjected, C 3−11 , 74−82 C 77−85 , not associated with the number of alleles required.Epitopes C 73−81 , C 80−88 , C 83−91 obtained antigenicity values below 0.5, remaining only C 70−78 , which presented as a possible allergen (Table 4).
Only 2 epitopes were identified in the NS1 protein, NS1 34−42 and NS1 176−184 , which were associated with 11 and 12 alleles, respectively, but the antigenicity values were less than 0.5, suggesting nonantigenic sequences, so they were not considered for further testing (Table 5).
Table 9 shows that the peptide sequence NS4A 48−56 was not considered for antigenicity analysis because it has binding affinity for only 4 HLA alleles.Only NS4A 4−12 and NS4A 106−114 were considered antigenic, but the immunogenicity value of the first peptide was significantly low.Therefore, only the NS4A 106−114 epitope of NS4A protein was further analyzed and allowed in all phases.
NS4B 21−29 binds to 1 HLA class I allele, and the others are linked to ≥ 5 alleles.NS4B 167−175 was determined to be antigenic and not-allergic, but had negative immunogenicity.Of the other 6 epitopes of NS4B protein that were classified as antigenic, non-allergic, only NS4B 199−207 was found to be immunogenic, non-toxic and 100% conserved.(Table 10).
Originally, many epitopes were predicted for the NS5 protein, but none of the epitopes that were able to bind to the required number of alleles passed all tests (Table 11).
The identification of epitopes that can be recognized by TCD8 + lymphocytes is very important in the cellular immune responses against intracellular microorganisms, such as viruses 46 .So, we hope that these sequences can activate this response and the infected cells can be destroyed.

BL epitope selection
The adaptive humoral immune response is one of the most sought after effects when it comes to vaccines.T-helper lymphocytes are crucial in the differentiation of B-lymphocytes.For this reason, finding LB epitopes that overlap with HTL epitopes ensures protection for different cell types working together.Here, a total of 98 BL epitopes, of varying size, were identified initially using the IEDB server.Of these, only 11 had a conservation of ≥ 90% and were used for sequence overlap analysis with HTL epitopes, in which only 5 ( NS3 277−281 , NS3 443−458 , NS3 458−471 and NS5 401−409 ) found matching sequences (Table 13).
Recently, Tosta et al. 73 found 28 CTL (06 HTL) epitopes overlapping with B-cell epitopes, including three (one) from the envelope, two (one) from the capsid, two from prM, four (one) from NS1, two (one) from NS2A, five from NS3, one (two) from NS4B, nine from NS5 73 .As mentioned earlier, no epitope matches our results, as we are extremely careful to avoid false positives as much as possible.
In our study, significant progress was made in immunoinformatic analysis, highlighting crucial aspects in epitope selection for the development of vaccines against flaviviruses such as YFV and Zika virus.Through comprehensive immunoinformatic analyzes, we were able to identify important epitopes in the NS3 and NS5 proteins of these viruses.The identified epitopes "PTRVV" (277-PTRVV) and "YMWLGARYL" (477-YMWLGARYL) can be compared to the promiscuous T-cell and B-cell epitopes predicted for Zika virus www.nature.com/scientificreports/by Dar et al. 74 .The identification of overlapping or similar epitopes in different flaviviruses such as Zika and YFV emphasizes the potential for cross-reactivity or shared immunological features between these viruses.However, "YMWLGARYL" shows intriguing complexity in epitope selection as it fails in allergenicity assays.This finding underscores the need to consider a range of factors, including allergenicity, when developing vaccines.

Population coverage
As we have seen so far, one of the most important concerns in vaccine development is the efficacy of the vaccine considering regional populations in the case of endemic diseases.Our method aims to predict overlapping epitopes between CTL, HTL, and B cells that can be recognized by the most common HLA alleles in Africa and South American populations and thus can induce both humoral and cellular immune responses.

Vaccine sequence construction
Having identified the immunogenic, nonallergenic, nontoxic, conserved epitopes of the YFV proteome capable of binding to a substantial number of alleles of the HLA system, we propose the construction of a vaccine prototype using these epitopes.Precisely because we know that vaccines based on classical vaccine platforms can cause allergic reactions because they are formulated with multiple proteins of the target organism 75 , we are interested in developing a prototype with epitopes that have the most important properties of a vaccine candidate, including minimal allergic reactions, toxicity, and side effects.In addition, we used the linkers in our multi-epitope vaccine to reduce (improve) the likelihood of misfolding of the fusion epitopes and low yield in vaccine production (folding, stability, and flexibility or stiffness of the designed chimeric vaccine candidate) 76 .At the N-terminus of our vaccine sequence, the apeptide adjuvant β-defensin was added to ensure high antigenicity and enhance the immunological response 38 .
Thus, our prototype vaccine consists of 10 CTL and 5 HTL with overlapping LB epitopes that are bonded together with the adjuvant beta-defensin, yielding a subunit vaccine with a total of 264 amino acids (Fig. 1).In general, the development of a vaccine is quite tedious and takes a long time.However, the early stages of this work demonstrate how immunoinformatics helps to reduce the required research time by selecting only the best sequences from a substantial number of viral proteins that are able to activate humoral and cellular immune responses against YFV.
Because simply binding to the MHC complex does not guarantee that these sequences are epitopes 77 , only those that met the previously defined criteria were used to construct our prototype, specifically.www.nature.com/scientificreports/

Physicochemical parameter prediction
The chemical formula of the vaccine sequence was predicted to be C 1213 H 1970 N 348 O 336 S 10 .The molecular weight was 27,125.72 kDa, the theoretical isoelectric point (pI) was 9.87, the instability index was 27.87, the aliphatic index was 100.42, and GRAVY was 0.315.These results indicate the preparation of a basic, stable (cut-off point < 40), thermostable, and hydrophobic sequence, which, with the exception of hydrophobicity, are favorable parameters for a vaccine.www.nature.com/scientificreports/

Design, refinement and validation of tertiary structure of the vaccine prototype
The amino acid sequence was uploaded into the Raptor X server, and the tertiary structure of the vaccine that is based on several epitopes was constructed as a consequence.Of the 264 residues, 264 (100%) were modeled, while 61 (23%) predicted positions were disordered.Absolute model quality is measured by total uGDT (GDT), which was predicted to be 94.For a protein with > 100 residues, uGD > 50 is a good indicator 78 .The relative quality of the model was also evaluated and a p-value of 1.559 × 10 −3 was obtained.All these results indicate a good overall tertiary model of the subunit vaccine, as the maximum number of residuals was modeled.Vaccine model refinement was performed using the GalaxyRefine2 server.The GalaxyRefine creates 5 refined models, and model 1 was selected as the final vaccine model for further analysis because it produced the best results, including GDT-HA (0.9725), RMSD (0.342), MolProbity (1.761), Clash score (9.5), Poor rotamers (0.6), and Rama favored (96.2) (Fig. 1).A higher GDT-HA value indicates superior overall model quality.The MolProbity score serves as a critical protein quality metric, amalgamating Clashscore, Rotamer, and Ramachandran scores into a normalized single score.RMSD provides insight into the average atom deviation between the refined and unrefined structure, and ideally, it should be minimal.A lower MolProbity value often correlates with the promotion of TH2 cytokines, fostering B-cell and antibody responses 47 .
To assess the structural and stereochemical integrity of the refined models, comprehensive all-atom structure validation analyses were conducted.These analyses encompassed the use of the Ramachandran plot, Z-score, ERRAT, Verify3D, and PROCHECK tools.Ramachandran plot revealed that 95.83% of the residues were in the most favorable ranges, 5.17% were in allowed ranges.There is no percentage of disallowed ranges.The Z-score of the model was estimated to be − 4.26, which is within the range of scores normally found for native proteins of similar size (Fig. 3A).As per the ERRAT assessment, the protein model exhibited an overall quality factor of 87.20% when compared to highly refined structures.Furthermore, in Verify3D, 89.02% of the residues achieved a 3D-1D score of ≥ 2 (as depicted in Fig. 3B, C), indicating a high level of compatibility with the benchmark models.This validation process confirmed the tertiary structure model's integrity, with no critical errors detected.
The results of the Molecular Dynamics (MD) simulations, portraying the stability and flexibility of the vaccine, are illustrated in Fig. 4. The root mean square deviation (RMSD) plot initially showed very little variation up to 15 ns in the range of 0.4-0.9nm, followed by a stable conformation up to 50 ns.This stability could be due to the higher number of stable bonds of the target protein.Then, the root mean square fluctuation (RMSF) value was calculated to investigate the structural flexibility of the backbone atoms of the protein.The results show that there are no large fluctuations and that the complex is flexible (RMSF ≤ 0.8 nm).MD simulations can be used prior to docking, as a set of "new" and broader protein conformations can be extracted from the processing of the resulting trajectory and used as targets for docking.These results suggest that the developed vaccines can strongly interact with immune receptors.

Molecular docking, refinement and comparative analysis
In the recognition of YFV by the host immune system, Toll-Like Receptor 2 (TLR-2) has been identified along with three other Toll-Like receptors (7, 8, and 9) as critical in the interactions between the 17D vaccine and human cells.According to Pulendran 47 , TLR-2 has the ability to induce both Th1 and Th2 cells and can indirectly facilitate either antibody production or a cytotoxic cellular response.Hence, the interaction between TLR-2 and the vaccine prototype was evaluated through protein-protein docking and was validated through a comprehensive structural validation protocol.
The binding of the refined vaccine model performed by the PatchDock server with TLR-2 resulted in 20 models ranked by a geometric complementarity score.The top 10 models were refined based on binding energy using FireDock (Fig. 2a).In the HADDOCK refinement process, a solvent shell was constructed around the top complexes, followed by a series of brief Molecular Dynamics (MD) simulations governed by the subsequent parameters: all atoms, excluding side-chain atoms at the interface, were restrained to their original positions.Subsequently, 1250 MD steps were executed at 300 K with position restraints applied to heavy atoms not participating in the Protein-Protein Interaction (PPI) (specifically, residues outside of intermolecular contacts within 5 Å).The systems were gradually cooled down through 1000 MD steps at 300, 200, and 100 K, while maintaining position restraints on the backbone atoms of the protein complex, except for those at the interface.In the final step, the optimal model was refined by applying the methodology of quantum mechanics/molecular mechanics (QM/MM).
In our research, we conducted a comparative analysis between structures derived from basic geometric optimization using classical mechanics, resulting in the 'PatchDock + FireDock + HADDOCK' (PFH) structure, and those obtained from advanced computational methods such as molecular dynamics (MD) simulations and quantum mechanics/molecular mechanics (QM/MM) techniques, which led to the 'PatchDock + FireDock + HADDOCK + MD + QM/MM' (PFHMQM) structure.We utilized the PRODIGY tool to assess energy scores and binding affinity of both models, as shown in Fig. 2b.This comparison unveiled significant discrepancies, particularly in the RMSD and Gibbs free energy parameters.The RMSD for PFH (6.3 Å) was higher than that for PFHMQM (2.2 Å), and ΔG_PFHMQM (− 17.1 kcal/mol) was lower compared to ΔG_PFH (− 12.9 kcal/mol).These results emphasize the importance of sophisticated geometric optimization calculations that provide a more accurate and reliable depiction of the system's electronic cloud.Consequently, even subtle structural changes brought about by these calculations can significantly alter ligand binding to receptors.
In summary, the docking analysis revealed robust interactions between the vaccine molecules and immune cells.However, it's important to note that this assessment was theoretical, and a real evaluation of binding potency within the host is still needed.To validate the docking results, various techniques, including molecular dynamics and QM/MM simulations, were employed.The MD + QM/MM analysis predicted substantial binding stability, which is vital for ensuring the lasting effectiveness and durability of the vaccine construct within the host.

Codon adaptation and in silico cloning
Validation of the candidate vaccine construct necessitates immunoreactivity screening via serological analysis 32 .This involves the expression of the recombinant protein in Escherichia coli expression systems, as these systems are well-suited for the production of recombinant proteins 79,80 .Codon optimization performed to achieve high-level expression of our vaccine prototype (801 nucleotides) in E. coli (strain K12) shows that the codon adaptability index (1.0) and GC content (55.42%) were favorable for a high-level expression in bacteria.
For gene expression in an organism of interest, the ideal CAI value should be 1.00, but a value > 0.8 is also considered good and the percent GC content should be in the range of 30-70%.These results suggest a good expression of the genetically engineered vaccine in the E. coli K-12 strain.After evaluating these parameters and once the sequence was free of commercially available restriction sites, two restriction sites XhoI and BamHI were added to the N-and C-terminal, respectively, of the optimized reverse codon sequence.Ultimately, the recombinant plasmid was generated by incorporating the reverse sequence of the adapted codon into the pET-28a( +) vector.The chosen restriction sites for this insertion were PspXI and BamHI, which served as the starting and ending cut points, respectively (Fig. 6).

Immune simulation of the multiepitope vaccine
Although available immunoinformatic algorithms for in silico predictors of epitopes and potential vaccines demonstrate remarkable precision, one of the greatest hurdles in the field is to correctly stimulate the immune system 81 .Here, the immune response simulation results obtained using C-ImmSim revealed that the secondary immune response, overall, was significantly greater than the first response, coinciding with what occurs in vivo immune response (Fig. 7).
One of the key elements in the immune response to the Yellow Fever Virus (YFV) is the presence of neutralizing antibodies, primarily of the IgG type.These antibodies play a critical role in establishing a longlasting immune defense against YFV and are considered a significant gauge of immunity 70 .Following the administration of a peptide vaccine booster, there was a notable increase in the antibody response, accompanied by a simultaneous decrease in antigen levels.This was evident from the narrower base of the antigenic spike in comparison to the previous dosage.During this period, a predominant humoral response was observed, characterized by higher IgM production.Despite the increased IgM production, the booster resulted in higher quantities of IgG compared to the initial dose, indicating a degree of seroconversion (see Fig. 7A).
Analysis of B-cell population per cell showed overall B-cell memory responses higher after boost dose and which remains in greater proportion for more than 200 days (Fig. 7B).In Fig. 7C, it is evident that the B-cell population remained higher and stable over the same time periods.As for the T cells, after the prime dose, there was a simultaneous increase in effector T-helper (Th) cells, while Th-memory cells showed a lower response, but after the boost, Th-memory respond faster and higher as expected for an efficient immune response to antigenic challenge (Fig. 7D).At the same period, there was a predominance of Th active cells (Fig. 7E).There was an increase of active effect cytotoxic T-cells after prime dose and following around day 30 (Fig. 7G, H), which corresponds to the period of application of the second dose, explaining that there was a previous recognition of these antigens.
After the initial prime dose, there was an initial surge in the IFN-γ response, which is associated with both CD8 + T-cell and CD4 + Th1 response.Additionally, there was a notable response in terms of IL-10 and TGF-β cytokines, which are associated with the T-regulatory (T-reg) phenotype.After boost, IFN-γ also peaked and

Conclusion
In summary, we identified 15 epitopes for T and B cells in the structural and nonstructural proteins of 196 strains of YFV that have essential properties for eliciting an effective and population-wide immune response, namely high antigenicity and immunogenicity, extensive conservation among different strains, comprehensive population coverage, non-toxicity and non-allergenic in humans.On this basis, we have developed and optimized the spatial geometry of a prototype subunit vaccine with appropriate physicochemical properties, the ability to bind to the Toll-2 receptor, the potential to elicit an effective and durable immune response in two doses, and a coding region that can be successfully inserted into a cloning vector.Experimental studies (in vitro and in vivo) are required to validate the efficacy and safety of the proposed vaccine (Supplementary information 1).

Figure 1 .
Figure 1.A comprehensive schematic portrayal of the workflow employed in the present study.

9 Figure 2 .
Figure 2. Molecular docking and refinements.(A) Energies values obtained after refinement by FireDock.(B) Molecular Docking and Refinements.(A) Energy values obtained after refinement by FireDock.(B) Best 3D docking model acquired after docking in PatchDock and subsequent refinements in FireDock, HADDOCK, MD, and QM/MM (structure PFHMQM).At the bottom of the image, energetic scores of the structures following sequential stages of geometric refinement.

Figure 3 .
Figure 3. Validation of the final subunit vaccine model.(A) Vaccine 3D Structure Validation by ProSA-web illustrating Z-score; (B) Quality factor and quality score by ERRAT (C) Verify3D tools, respectively.

Figure 4 .
Figure 4. 3D structural conformation of the multi-epitope subunit vaccine after homology modeling and refinement by SwissModel and Galaxy Refine servers.

Table 1 .
List of predicted MHC-I (CTL) epitopes on YVF proteins, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 2 .
List of predicted MHC-I (CTL) epitopes on YVF E protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 3 .
List of predicted MHC-I (CTL) epitopes on YVF M protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 5 .
List of predicted MHC-I (CTL) epitopes on YVF NS1 protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 6 .
List of predicted MHC-I (CTL) epitopes on YVF NS2A protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 7 .
List of predicted MHC-I (CTL) epitopes on YVF NS2B protein, with the sequence of each peptide and its allele number, antigenicity, score, allergenicity, immunogenicity, toxicity, and conservation.

Table 8 .
List of predicted MHC-I (CTL) epitopes on YVF NS3 protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 9 .
List of predicted MHC-I (CTL) epitopes on YVF NS4A protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 10 .
List of predicted MHC-I (CTL) epitopes on YVF NS4B protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 11 .
List of predicted MHC-I (CTL) epitopes on YVF NS5 protein, with the sequence of each peptide and its allele number, antigenicity score, allergenicity, immunogenicity, toxicity, and conservation.

Table 12 .
List of predicted MHC-II (HTL) epitopes in YVF proteins, with the sequence of each peptide and its allele number, antigenicity prediction score, allergenicity, toxicity and conservation.

Table 14 .
Population coverage of the final CTL and HTL epitopes for the principal endemic regions of yellow fever-Africa and South America.